FOURIER BASED FAST MULTIPOLE METHOD FOR THE 
HELMHOLTZ EQUATION 
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Abstract. The fast multipole method (FMM) has had great success in reducing the computa- 
tional complexity of solving the boundary integral form of the Helmholtz equation. We present a 
formulation of the Helmholtz FMM that uses Fourier basis functions rather than spherical harmonics. 
By modifying the transfer function in the precomputation stage of the FMM, time-critical stages of 
the algorithm are accelerated by causing the interpolation operators to become straightforward ap- 
plications of fast Fourier transforms, retaining the diagonality of the transfer function, and providing 
a simplified error analysis. Using Fourier analysis, constructive algorithms are derived to a priori 
determine an integration quadrature for a given error tolerance. Sharp error bounds are derived and 
verified numerically. Various optimizations are considered to reduce the number of quadrature points 
and reduce the cost of computing the transfer function. 
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1. Introduction. Since the development of the fast multipole method (FMM) 
for the wave equation in [19, 5, 20, 10, 18], the FMM has proven to be a very ef- 
fective tool for solving scalar acoustic and vector electromagnetic problems. In this 
paper, we consider the application of the FMM to the scalar Helmholtz equation, 
although our results can be immediately extended to the vector case as described 
in [3, 7]. The application of the boundary element method to solve the integral form 
of the Helmholtz equation results in a dense linear system which can be solved by 
iterative methods such as GMRES or BCGSTAB. These methods require computing 
dense matrix-vector products which, using a direct implementation, are performed 
in 0(N 2 ) floating-point operations. The FMM uses an approximation of the dense 
matrix to perform the product in 0(N log N) operations. This approximation is 
constructed from close-pair interactions and far-field approximations represented by 
spherical integrals that are accumulated and distributed through the domain via an 
octree. 

There are a number of difficulties in implementing the FMM, each of which must 
be carefully considered and optimized to achieve the improved complexity. The most 
significant complication in the Helmholtz FMM is that the quadrature sampling rate 
must increase with the size of the box in the octree, requiring interpolation and anter- 
polation algorithms to transform the data between spherical quadratures of different 
levels of the tree. Local algorithms such as Lagrange interpolation and techniques 
which sparsify interpolant matrices are fast, but incur significant errors [16, 7]. Spher- 
ical harmonic transforms are global interpolation schemes and are exact but require 
fast versions for efficiency of the FMM. Many of these fast spherical transform al- 
gorithms are only approximate, complicated to implement or use, and not always 
stable [8, 13, 24]. 

In this paper, we use a multipole expansion which allows the use of 2D fast Fourier 
transforms (FFT) in the spherical coordinate system ((f), 9). The main advantages are 
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INotation 


Description 


K 


wavenumber, 2tt/\ with wavelength A 


ai 


box size at level Root: I = 0. Highest active level: 1 = 2. 


e 


polar angle 


<£ 


azimuthal angle 


c 


complex conjugate of c 


X 


vector in 1 , x = \x\ x 


x ■ y 


inner product, x ■ y = \x\ \y\ cos(f x ^ y ) 


S 2 


sphere, {s e M 3 : \s\ = 1} 


Jn 


spherical Bessel function of the first kind 


Vn 


spherical Bessel function of the second kind 


h W 
lln 


spherical Hankel function of the first kind 


Pn 


Legendre polynomial 


mi] 


nth coefficient of /'s Fourier series in x, f(x) = ^2 n J^[f] e mx 


Table 1.1 
Table of notations 



two fold: i) high performance FFT libraries are available on practically all computer 
platforms, resulting in accurate, robust, and fast interpolation algorithms; ii) the 
resulting error analysis is simplified and leads to sharp, a priori error bounds on the 
FMM. One of the difficulties in using FFTs is that we are forced to use a uniform 
distribution of points along <f) and 8 in the spherical quadrature. Naively, this leads 
to a much increased quadrature size for a given accuracy compared to the original 
spherical harmonics-based FMM. The reason is as follows. The multipolc expansion 
in the high frequency regime is derived from: 

—-T - / / e™ S r T Lxa {s) sin(0) d9 d0 
\r + r \ J =o J 0=0 

where s = [cos(<fi) sin^), sin(</>) sin(#), cos(#)] is the spherical unit vector. It is ap- 
parent that we are integrating along 9 a function which has period 2ir. However the 
bounds of the integral are to ir, over which interval the function has a discontinuity 
in its derivative. This results in a slow decay of the Fourier spectrum (essentially 
1/freq 2 ) of the integrand. Consequently, a large number of quadrature points along 9 
are required. 

We propose to use a variant of the scheme by J. Sarvas in [22] whereby the 
integration is extended from to 2ir and the integrand modified: 

p iK,\r+ra\ i i>27r p2-rr 

T——T = ~J \ e™ s - r T itro (s) |sin(0)| d9dcf> (1.1) 
\r + r Q \ 2J <p=o J 0=o 

We will describe in more details how an efficient scheme can be derived from this 
equation. The key property is that e lKS ' r is approximately bandlimitcd in 9 and 
therefore it is possible to remove the high frequency components of T^,, (s)| sin(#)| 
without affecting the accuracy of the approximation. Using this smooth transfer 
function, which is now bandlimitcd in Fourier space, the number of quadrature points 
can be reduced dramatically. We show that the resulting number of quadrature points 
is reduced by about 40% compared to the original spherical harmonics-based FMM. 
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es Consequently, we now have a scheme which requires few quadrature points and enables 

eg the use of efficient FFT routines. 

70 The approach in [22] is similar. However, rather than smoothing Te_ ro (s) |sin(#)| 

71 once during the precomputation phase as we detail in this paper, Sarvas instead 

72 incorporates the |sin(0)| factor during the run-time phase of the FMM after the appli- 

73 cation of the transfer function. Although a detailed analysis is required for accurately 

74 assessing the relative cost of the two approaches, the technique in [22] requires ap- 

75 proximately 1.5 times more sample points in the time-critical transfer pass of the 

76 algorithm, and requires an extra anterpolation step after applying T with about twice 

77 more sample points than used for the method in this paper. We also note that the 

78 error analysis for the two methods is different, and is easier to carry out with the 

79 approach in this paper. 

so We derive a new a priori error analysis which incorporates both effects from 

si truncation of the Gegenbauer series (a problem well analyzed [3]) and the numerical 

82 quadrature. Our algorithm to predict the error is very sharp. The sharp bounds 

83 allow the method to choose a minimal number of quadrature points to guarantee a 

84 prescribed error. By comparison, the conventional approach leads to less accurate 
as estimates resulting in either lower accuracy than requested or higher computational 

86 cost (over-estimation of the required approximation order). Although not consid- 

87 ered in this paper, our error analysis approach can also be applied to the spherical 
as harmonics-based FMM to yield similarly accurate error bounds. This has practical 
as importance since it allows guaranteeing the error in the calculation while reducing 
90 the computational cost. 



91 The novel contributions of this paper can be summarized as follows: 

92 • Development of an efficient Helmholtz multi-level FMM which uses FFTs in 

93 the inter/anterpolation steps while retaining diagonal transfer and translation 

94 functions. The use of FFTs allows leveraging high performance FFT libraries 

95 available for most machines, sequential and parallel. 

96 • An error analysis that accounts for all error in the method and yields con- 

97 structive algorithms to choose optimal method parameters. 

98 • Details of various optimizations to reduce the computational cost (e.g. use of 

99 symmetries in the precomputation of the transfer functions, use of symme- 

100 tries on the unit sphere for the inter/anterpolation steps, optimization of the 

101 quadrature points near the poles of the unit sphere) . 

102 • Pseudocodes are provided to clarify the method and help with an implemen- 

103 tation by the reader. 

104 • Demonstration of the sharpness of the error bound and the asymptotic com- 
los putational cost. 

106 The paper is organized as follows. In Section 2, we introduce the critical parts 



107 of the classical Helmholtz FMM including the Gegenbauer series truncation (2.1), 

108 the spherical quadrature (2.2), and a short overview of interpolation/anterpolation 

109 strategies (2.3). Section 3 details the Fourier basis approach. The transfer function 
no must be modified to lower the computational cost and obtain a competitive scheme, 
in as detailed in Section 3.2. Section 3.3 analyzes the integration error to derive an 
n2 algorithm which determines a quadrature with a prescribed error tolerance. The FFT 
n3 based interpolation and anterpolation algorithms are described in Section 3.4 and 
n4 numerical results are given in Section 3.5. Table 1.1 lists the notations used in this 
us paper. 
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lie 2. The Multilevel Fast Multipole Method. The FMM reduces the compu- 

u7 tational complexity of the matrix- vector multiplication 

na for i,j = 1, . . . , N from 0(N 2 ) to 0(N\ogN). This improvement is based on the 

us Gcgcnbauer series 



e «fc|r+r | 



= in ^(-l) n (2n + !)*#>(« |r |)j„(/s |r|)P„(r • f ) (2.2) 



^" + ^0, 

1 1 n=0 

120 The series converges absolutely and uniformly for |r | > ^ \r\ and has been studied 

121 extensively in [2, 6]. 

122 Truncating the Gegenbauer series at I and using an integral over the unit sphere, 

123 : 

o in\r+r \ 



e 



\r + r \ 



f e™ s - r T e , ro (s)dS(s)+e G 
Js 2 



124 where eg is the Gegenbauer series truncation error and the transfer function, X^ ro (s), 

125 is defined as 

£ 

Te,r (S) = £ E l ^ 2n + l ) h n ] ^ \ro\)P n {$ ■ *(>)■ (2.3) 

126 The reduced computational complexity of the FMM is achieved by constructing a 
12? tree of nodes, typically an octree, over the domain of the source and field points. We 
i2s recall the main steps of the FMM to set some notations. Let M l a (s) be the outgoing 

129 field for B l a , the box a of the tree in level I € [0, L] with center c l a . 

130 Initialization: The method is initialized by computing the outgoing plane-wave 

131 expansions for each cluster contained in a leaf of the tree: 

M t{s)= E ^ie" a - (a *- c£) 

i, Xi£B£ 

132 Upward Pass (M2M): These outgoing expansions are then aggregated upward 

133 through the tree by accumulating the product of the child cluster expansions with the 

134 plane-wave translation function: 

M * 1 (s) = 2 M l p{s)e lKS < cl e- c «^ l = L,L-l,...,3 

135 Transfer Pass (M2L): Incoming expansions, I l a (§) of box B l a , are computed from 

136 the outgoing by multiplication with the transfer function: 

I l a (M)= E M l p(s)T e ^_ cl Js) l = L,L-l,...,2 

0eI(B'J 

137 where I{B l a ) is the interaction list of box B l a , defined as all boxes of level I which are 
us not neighbors of B l a , but whose parent is a neighbor of the parent of B l a . 
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139 Downward Pass (L2L): The incoming expansions are then disaggregated downward 
«o through the tree to compute the local field L l a (s) of box B l a : 

L l +\s) = 4(S) e *--(4-<i +1 ) + I l + 1 (s) I = 2, 3, . . . , L - 1 

i4i where B 1 ^ 1 C Bj,. 

«2 Field Computation: At the finest level, the integration over the sphere is finally 

143 performed and added to the near-field contribution to determine the field value at the 

144 N field points: 

f r | *&i 3* J 1 | 

n - LH,)^-> dS(») + V i^^* 

us where a?i € -B^ and Af(B^) is the neighbor list of -B^, defined as B% and all neighbor 

He boxes of B% ■ 

147 2.1. Truncation Parameter in the FMM. The truncation parameter t must 

us be chosen so that the Gegenbauer series (2.2) is converged to a desired accuracy, 

us However, for n > x, j n (x) decreases super-exponentially while h„\x) diverges. The 

iso divergence of the Hankel function causes the transfer function to oscillate wildly and 

151 become numerically unstable. Even though the expansion converges, roundoff errors 

152 will adversely affect the accuracy if t is too large. Thus, while one must choose 

153 I > k \r\ so that sufficient convergence of the Gegenbauer series is achieved, it must 

154 also be small enough to avoid the divergence of the transfer function. The selection of 

155 the truncation parameter I has been studied extensively and a number of procedures 

156 for selecting it have been proposed [5, 6]. 

157 The excess bandwidth formula (EBF) is derived from the convergence of the 
us plane- wave spectrum as presented in [3] . The EBF chooses I as 

<a k\t\ +C(k\v\) 1/3 (2.5) 

159 An empirically determined common choice is C — 1.8(do) 2 ^ 3 , where d is the desired 

160 number of digits of accuracy. The EBF is one of the most popular choices to select 

161 the truncation parameter [12]. 

162 The actual Gegenbauer truncation error for a given t can also be approximated. 
«3 As Carayol and Collino showed in [2] , an upper bound of this error for large values of 
164 \r\ is obtained when P n {f ■ r ) = (±1)" so that 

oo 

(Tir(2n+l)/ i ( 1 )( K |r |) J „( K |r|) 

n=l+l 



(2.4) 



165 which they showed can be computed in closed form 



.2 \JW\ 

\r a \±\r\ 



h< e+A K \ r o\)j£(K\r\) ± h ( p(n\r a \)j e+1 ( K \r\) 



(2.6) 



This fails for small \r\ when the upper bound is obtained by choosing r ■ f such that 
the oscillation of P n (f-fo) compensates for the oscillation of (— l) n hn\n \ro\)j n (K |r|). 
Using the EBF as an initial guess for t and refining the choice using the above closed 
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W9 form when \r\ is sufficiently large is a simple algorithm which yields a nearly optimal 

170 value for £ (that is the smallest value consistent with the target error). This is the 

171 scheme we selected for this paper. 

172 Carayol and Collino in [1] and [2] present an in-depth analysis of the Jacobi- Anger 

173 series and the Gegenbauer series. They find the asymptotic formula 

174 where W(x) is the Lambert function defined as the solution to 

W(x)e w{x) =x x>0 

175 This appears to be near optimal for large box sizes. 

176 The errors introduced by this truncation have been investigated in other papers 

177 including [16, 2, 6]. 

178 2.2. Spherical Quadrature in the FMM. The error analysis is simplified if a 

179 scheme is used which exactly integrates spherical harmonics, Y™, up to some degree, 
wo The most common choice of quadrature uses uniform sample points in <f> and Gauss- 

181 Legendre sample points in z{9). With N + 1 uniform points in the 4> direction and 

182 ?l±l Gauss-Legendre points in the 9 direction, all Y™, — n <m<n,0<n<N are 

183 integrated exactly [7, 16]. 

184 2.3. Interpolation and Anterpolation in the FMM. The quadrature sam- 

185 pling rate depends on the spectral content of the translation operator, e JKS r . Its 

186 coefficient in the spherical harmonic expansion decreases super-exponentially roughly 
is? for n > k \r\. Therefore, as fields are aggregated in the upward pass and \r\ becomes 
las larger, a larger quadrature is required to resolve higher modes. These modes must 

189 be resolved since they interact with the modes in the transfer function, which do not 

190 significantly decay as £ increases. 

191 Similarly, as fields are disaggregated in the downward pass, |r| becomes smaller 

192 and the higher modes of the incoming field make vanishingly small contributions to 

193 the integral as a consequence of Parseval's theorem. Thus, as the incoming field 

194 is disaggregated down the tree, a smaller quadrature can be used to resolve it. This 

195 makes the integration faster and is actually required to achieve an optimal asymptotic 

196 running time. See Table 2.1. 

197 There have been several approaches to performing the interpolation and anter- 

198 polation between levels in the FMM. Below, we enumerate a number of options that 

199 have previously been studied. 

200 General, local interpolation methods like Lagrange interpolation, Gaussian inter- 

201 polation, and B-splincs are fast and provide for simple error analysis [16]. 

202 A spherical harmonic transform maps function values sampled at (4>k,@k), to 

203 a new quadrature {<j>' k , ,0' k ,), via the linear transformation 

f«= E Y i m > e 'k>) E ^Yr(h, o k )f k = E A**/* (2-7) 



204 This results in an O(NlogN) or C(iV 3 / 2 ) FMM (see Table 2.1). Fast spherical 

205 transforms (FST) have been developed in [8, 13, 24, 21] and applied to the FMM in [4]. 

206 Using the FST reduces the interpolation and anterpolation procedures to 0{K log K), 
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207 which results in an 0(N log 2 N) FMM. However, the accuracy and stability of these 

208 algorithms remain in question. 

209 Approximations of the spherical transform have also been investigated in [14, 7]. 

210 The interpolation matrix A^k in (2-7) can be sparsified in a number of ways to provide 
2n an interpolation/antcrpolation method that scales as 0(K) with controllable relative 

212 error. Many other interpolation schemes exist with varying running times and errors. 

213 Rokhlin presents a fast polynomial interpolator based on the fast multipole method 
2H in [9]. See also [15]. 

215 The asymptotic computational complexity in the big-0 sense is summarized in 

216 Table 2.1. 

Interpolation Volume Surface 

Direct TV log N TV 3 / 2 

Fast Global N N log 2 N 

Local N N log N 

Table 2.1 

Computational complexity in the big-O sense. Column 2 and 3 refer to the distribution of 
particles. We assume that N = C((Kao) 3 ) for a volume of scatterers and N = O((reao) 2 ) for a 
surface of scatterers. Direct refers to a computation of the operator with no acceleration (basi- 
cally several matrix vector products). Fast Global refers to fast spherical harmonic transforms or 
fast Fourier transforms (this work). Local are methods based on local interpolation such as using 
Lagrange interpolation. 



21? 3. Fourier Based Multilevel Fast Multipole Method. The Fourier based 

218 fast multipole method is based on the identity 

/ e lKS - r T e . ro (s)dS(§)= E r (0,^Tl ro (e,^d ( f>d6 (3.1) 

Js 2 Jo Jo 

219 where the translation function E r {9, 4>) and the modified transfer function Tf ro {9, <f) 

220 are defined as 

E r (6, 4,) = e™ S r iy, ro (M) = ^ 2i.ro (S) |sin(0)| (3.2) 

221 with s = [cos(c/>) sin(0), sin(0) sin(#), cos(0)] and T^ ro {s) is the transfer function de- 

222 fined in Equation (2.3). The J s2 representation is common and implied throughout 

223 the discussion in Section 2. The natural basis for integration on the sphere is the 

224 spherical harmonics Y™ which form an orthonormal basis of L 2 (S 2 ) and the FMMs 

225 of Section 2 attempt to preserve this basis expansion in the upward and downward 

226 pass. The representation suggests the use of the Fourier functions {e'^e"™ 9 } 

227 which form an orthonormal basis of i 2 ([0,27r] x [0,27r]). Doubling the sphere to use 

228 Fourier methods is presented in a more general manner in [22, 23]. 

229 Using a Fourier basis rather than a spherical harmonics basis allows i) using 

230 two dimensional uniform quadratures; ii) fast Fourier transforms in the interpolation 

231 and anterpolation steps; and hi) spectral analysis in the error estimates. Of these 

232 advantages, the most important is that the FFT interpolations and anterpolations are 

233 fast and exact. Since there is no interpolation error, the only significant contributions 

234 to the final error are the truncation of the Gegenbauer series and the integration 

235 error due to the finite quadrature. Thus, the error analysis is simplified and we will 

236 determine in this paper precise bounds on the final error. In fact, our error analysis 
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237 is fairly general and can be extended to the classical FMM with schemes that exactly 

238 integrate spherical harmonics (see direct and fast global methods in Section 2.3). The 

239 result is a fast, easy to implement, and controllable version of the FMM, which we 

240 detail in the following sections. 

241 3.1. Fourier Quadratures. 

242 3.1.1. Spherical Fourier. A periodic complex function defined over [0,27r] x 

243 [0, 2tt] with spherical symmetry, 

/(0,0) = /(27r-M + $ (3-3) 

244 has the trigonometric polynomial representation 

oo oo 

/(M)= E E Jn..me< n0+m ^ (3.4) 

n— — oo m— — oo 

245 where the symmetry condition (3.3) is equivalent to 

7n,m = (-l) m f-n, m Vn, 171 (3.5) 

246 If functions / and g can be represented exactly by trigonometric polynomials of 
24? degree (N$, N^,) then their L 2 inner-product can be computed exactly as 

(/,<?>=/ / f{9^)g{9^)d9d<t ) ^An 2 V V f n , m g^ 
Jo Jo ~ 

7i— — No m=—N ( f ) 

4^2 



)g(9 n ,<j> m ) 



v n=l m=l 



248 where 



N e = 2N e + 1 
AU = 2A0, + 1 



2nn 

A^ 
2irm 



249 If is made even (by padding the Fourier coefficients with an extra zero), then 

) (3-6) 

250 so that only half of the sampled values must be stored. 

251 3.1.2. Integration with Fourier Filtering. A key concern in using the 2D 

252 Fourier basis rather than the spherical harmonics in Equation (3.1) is the integration 

253 weight | |sin(0)|. Although the transfer function, Ti ro is bandlimited in 9 and <f> the 

254 modified transfer function, T% is not bandlimited in 9 due to the integration weight. 

255 In this section, we review a common strategy for integrating functions that are not 

256 bandlimited against functions that are bandlimited or nearly bandlimited. 

257 Consider the periodic functions 

oo oo 

/(*)= E & e ' nfl 9(0)= E 9 m e m6 
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Note that their exact integral on [0, 2ir] is 

(f,g)= / f(e)g(e) ae = 2n E f~ 9n 

Computing this integral numerically with a uniform quadrature of size K yields 

~ K oo oo K 

(f,g)K = ^£m)^) = ^ E E /*E e2W 



(3.7) 



71 — — OO 711 — — OO 

OO oo 



fc=l 



^ E E /« 

71 — — OO 711 — — OO 



So the error is exactly given by 
K/,S>-</,S>*-|=27r 



E E/nSn 

n=— oo m^O 



traif 



27T 



E E f™+™i<gn 

n=— oo m/0 



261 
262 
263 
264 
265 
266 
267 
268 



The error is therefore determined by the asymptotic decay of both /„ and g n . For 
example, if |/„| decays as l/n a , a > 1, then \f n Kgo\ ~ which is the 

typically expected decay of the error on a uniform quadrature as a function of K. 
The equivalent result holds for g n . Thus, if the spectrum of either / or g decays 
slowly, a very large quadrature K is needed. 

However, if the spectrum of / decays slowly and the spectrum of g decays quickly 
a much smaller quadrature can be used by truncating the Fourier coefficients of /. 
Let us define a bandlimited version of f(9) 



N 



fN(9)= E /« e 



1710 



(3.8) 



n=-N 



269 and use a uniform quadrature of K points to compute the integral. Then 

N oo 

(fN,g)K = 27T E E 

n-\-7nK 

n— — N 771— — oo 



270 which yields the error 

\(f,g)-(fN,g) K \ = 2w 



N 



E ~ E E f n 9 

n-\-mK 

\n\>N n=-iVm/0 



(3.9) 



271 This error can be made small by requiring only that g n decays quickly enough. That is, 

272 \fn\ can decay slowly provided \g n \ decays quickly. The first term of (3.9) represents 

273 the truncation error in using the bandlimited /jy instead of /. The second term 

274 represents the aliasing error resulting from the finite sampling of the function g. As 

275 an example, if we assume that g n is negligible for \n\ > N then it is sufficient to 

276 choose K = 2N + 1 so that \n + mK\ > N for all -N < n< N and m ^ so that 
2 " g n +mK is always negligible. 



10 



C. CECKA AND E. DARVE 



278 Detailed numerical results will be presented in Section 3.5. However, to demon- 

279 strate this idea on a simple example, we build a quadrature to calculate: 

/ |sin0| cos(64cos(6>)) d6 
Jo 

2so which is a simple model problem for [see Equation (3.1)] 

/ T,, ro (s)|sin0| e lK *- r d9 
Jo 

281 when r = 64/ k z (z-axis). Using a uniform quadrature to compute this integral yields 

282 slow convergence in 1/N 2 because the function f{9) = |sin0| is C° (see Figure 3.2). 

283 Instead, if we smooth |sin0| and remove the high-frequency components following 

284 Equation (3.8), we obtain a much faster convergence (see Figure 3.2). The number 

285 of quadrature points is K = 2N + 1. The Fourier spectrum of g(9) = cos(64 cos(0)) 

286 decays rapidly once \n\ > 64 (see Figure 3.1). Convergence should occur once N > 64 

287 and this is indeed what is observed in Figure 3.2. The exact value of the integral 

288 sin(64)/16 is used as the reference solution to calculate the error. 




-120-100-80 -60 -40 -20 20 40 60 80 100 120 



Fourier frequency n 

Fig. 3.1. The spectrum of g{6) = cos(64 cos(6>)) showing a rapid decay of the coefficients for 
\n\ > 64. The Fourier spectrum was computed using 256 sample points. 

289 The idea of accurately calculating the integral of a product of two functions by 

290 analytically removing high frequencies in one of the two functions can be found in 

291 other papers dealing with the fast multipole method for the Helmholtz kernel in the 

292 high frequency regime, e.g. [7, 6]. In the context of these papers, the smoothing 

293 operation (removal of high frequencies) is often termed anterpolation or subsampling. 

294 A similar idea is found in Sarvas et al. [22]. In McKay Hyde ct al. [17] (Appendix A, 

295 p. 254-257), this idea is used in the more general context of calculating the integral 

296 of the product of a discontinuous function with a C 1 piecewise-smooth and periodic 

297 function. 

298 3.1.3. Fourier Interpolation and Anterpolation. Fast Fourier interpolation 

299 and anterpolation methods are key to creating an efficient Fourier based FMM by 

300 supersampling trigonometric polynomials and truncating spectral content that does 

301 not significantly contribute to the final result. A Fourier interpolation pads the Fourier 

302 coefficients of a function with zeros and increases the sampling rate in real-space. 

303 A Fourier anterpolation removes high frequencies of a function and decreases the 

304 sampling rate in real-space. 
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Fig. 3.2. Relative error in the integral as a function of N . The number of quadrature points 
is K = 2N + 1. The blue line integrates the function using a Fourier filter for \ s'm(8)\ as shown 
in Equation (3.9). The rapid convergence can be seen after N > 66. The trapezoid method has a 
slower convergence in 1/K 2 because |sin9| is only C° . 



305 To motivate the use of Fourier interpolations and anterpolations in the Hclmholtz 

306 MLFMM, suppose that the spectrum of Jn{8) is bounded, that is |/„| < F, and that 

307 K = 2N + 1, then (3.9) simplifies to 



\(f,g)-(fN,g)K\<4*F 



9r. 

\n\>N 



(3.10) 



This can be used to find an appropriate truncation parameter N if g n is known or 
can be approximated. The key step to constructing a fast algorithm is to note that 
the Fourier series of E r (0, 4>) in decays rapidly while the Fourier series of T* rg (6, <fi) 
in 8 decays slowly. This is due to the slow decay of the Fourier series of |sm(0)|: 



•7*[|sin(0)|] 



(-!)" + ! 
7r(l-n 2 ) 



— r if n even 
if n odd 



The spectrum of the plane-wave is given by 



ik\v\ cos((p§ r ) 



(3.11) 



n— — oo 



where ipg t r is the angle between s and r. The functions J n decay rapidly once n > k \r\ 
resulting in an approximately band-limited function. In the following sections, we use 
the spectral decay of the translation function E r (6,4>) (which plays the role of g in 
the previous section) to determine an appropriate Fourier truncation of the modified 
transfer function, r/ ro (0, 4>) (which plays the role of /). 

Equation (3.10) illustrates the need for interpolation and anterpolation as de- 
scribed in Section 2.3 but in the context of the Fourier basis. With g n — F^E^^] = 
t n J n (K,\r\), the truncation N is approximately N > n\r\ ~ £. During the upward 
pass, \r\ increases and we require more modes in / = T^ ro . During the downward 
pass, the incoming local field, L l in Section 2, is to be integrated against translation 
functions of increasingly smaller \r\. See Equation (2.4) with r — — Xi. The high 
modes of the field do not significantly contribute to the exact integral (3.7) so the 
field can be safely anterpolated at each downward step. 
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Fig. 3.3. The center of each box represents one transfer vector ro which must be computed. The 
pictures on the left and right panels represent the same set of boxes viewed under two different angles. 
Due to the symmetries of the quadrature, we need only compute transfer vectors with x, y, z > and 
x > y. We therefore end up with essentially half of an octant. Specifically, 34 transfer vectors are 
required; they can be reflected into any of the 316 needed. 



326 3.2. Computing the Bandlimited Modified Transfer Function. Select a 

327 2D uniform quadrature with points (8 n , <p m ) defined by 

Inn 2imi 



32a Noting that the plane wave E r {9,qb) and modified transfer function Tf ro (9,<j)) 

329 both have spherical symmetry (3.3), the computational and memory cost are reduced 

330 by requiring N<j, to be even so that only half of the quadrature points need to be 

331 computed and stored. 

332 Additionally in an FMM with a single octree, there are 316 distinct transfer 

333 vectors ro per level. By enforcing symmetries in the quadrature, the number of 

334 modified transfer functions that must be precomputed is reduced. Specifically, by 

335 requiring Ng to be a multiple of 2 and N$ to be a multiple of 4, we enforce reflection 

336 symmetries in the z = 0, x = 0, y — 0, x = y, and x = — y planes. This reduces 

337 the number of modified transfer functions that need to be precomputed from 316 per 

338 level to 34 - saving a factor of 9.3 in memory and costing a negligible permutation of 

339 the values of a computed modified transfer function. See Figure 3.3. 

Suppose we have chosen a quadrature characterized by (Ng,N^). Following Sec- 
tion 3.1.2, we need to exactly calculate a bandlimited version of Tf , called T*'^, 



S40 



341 



342 such that: 

343 Since T^ ro is bandlimited in 9 with bandwidth 21+1, only the frequencies |m| < 

344 Ng/2 + £— 1 of |sin(0)| contribute to T%' . Therefore, the exact bandlimited modified 

345 transfer function, T/'^, can be computed using the pseudocode in Algorithm 1. 
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! s n <- 7*[|sin(0)|] = for all \n\ < N e /2 + , 

2 for <j> m , m = 0, ... , N^/2 - 1, do 

3 T(6 n , <j> m ) <- ±T,, ro (|^, m ), n = 0, . . . , 21; 

T n <- J 7 ® [T] ; 



348 
349 
350 
351 
352 
353 



i; 



T*' L <— s <g> T convolution of Fourier series; 
T*' L <— truncate to frequencies \n\ < Ne/2 — 1; 



T^ L {9 n , 4> m ) <- inverse transform of T^ L ; 

Algorithm 1: Pseudocode to compute the bandlimited modified transfer function, 
T 5 ' L , given e, r , Ng, and AT . 

Algorithm 1 yields the bandlimited modified transfer function at (9 n ,(p m ), < 
n < Ng, < m < N^/2 which can be unwrapped to the remaining points by using 
the spherical symmetry (3.6). Note that this calculation can also be performed in 
real-space. It is equivalent to making a Fourier interpolation of Tk from 2£+l points 
to Ne + 2£ — 1 points, multiplying by a bandlimited |sin(#)|, and performing a Fourier 
antcrpolation back to Ng points, as shown in Figure 3.4. 



Precomputation 



|sin(0)| 



sample 



freqs \n\ < Ng/2 + £ - 1 



2£+ 1, 2£+ 1 
FFT Interp 9 

J 

N 6 +2l-\,2l+\ 



\ 




N B + 2£-l 







multiply 







Ng + 2£ - 


1,2£+1 







FFT Anterp 9, 



Ng, 




sample 



Fig. 3.4. Procedure for precomputing the bandlimited modified transfer function and its appli- 
cation to an outgoing field. The boxed numbers give the numbers of quadrature points for 8 and <j> 
(Ng and N<p) at each stage. 
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354 Because sampling the transfer function at a single point is an 0(£) operation 

355 and N,p,Ng e 0(£), the algorithm as presented is 0(£ 3 ). The computation of the 

356 transfer function at all sample points can be accelerated to 0(£ 2 ) as in [11] by taking 

357 advantage of its symmetry about the r axis and using interpolation algorithms, but 

358 at the cost of introducing additional error. 

359 3.3. Choice of Quadrature. The quadrature parameters can be constructively 

360 computed by determining the maximum error they incur. The error in computing the 

361 desired integral using the bandlimited modified transfer function with a finite uniform 

362 quadrature is 



r2ir r2w a 2 N <> N * 

/ / E r (6, <j>) T[ ro {8, <j>) d^A8 - — — V V E r (0 n , </> m ) T* r L o (8 ni( f>m) 

(3.12) 

363 where Tp^ o {6 n ,(j} m ) is the bandlimited modified transfer function described in Scc- 

364 tion 3.2. 

365 3.3.1. Choosing Ng. The worst-case for £/ in terms of Ng occurs when r and 

366 r arc aligned with the z-axis. This causes all spectral information to be contained in 
36? the ^-direction and makes £/ a function of Ng only. Integrating <fi out of (3.12) yields 



2tt 



r-2-K r, Ng 

/ E ±]r]£ (6, 0) T[ lral£ (8, 0)d8-—Y / E ±]r]2 (e n , 0) T^;^ £ (0 n ,O) 

J ° 9 n=l 



368 which is equivalent to the ID case considered in Section 3.1. Retrieving the Fourier 

369 coefficients of the plane wave in the case f = z from (3.11) and numerically com- 

370 puting exactly the low frequencies of the modified transfer function as described in 

371 Section 3.2, then (3.9) leads to 



£ T„W T „( K |r|)- ]T J2 T n* n+mNeJ T(n + mN e) (K\r\) 
\n\>Ng/2 \n\<N e /2m^0 



372 where = F^\T^ r ^ ~(0, 0)]. Due to the very fast decay of the Bessel functions, we 

373 find it is sufficient to apply the triangle inequality and keep only the lowest order 

374 Bessel function terms: 

oo 

|e?|<4^ 2 |MiW«M)l ( 3J3 ) 

n— — oo 

375 where 

* w ,„)- {f« -n w<^ (3,4, 

y\n\ \n\ > Ng/2 

376 Equation (3.13) can be used to search for a value Ng via Algorithm 2. 

377 Since 7V™ aa: in Algorithm 2 is typically only a small constant larger than 2£+l, the 

378 algorithm as presented is dominated by the computation of the 0{£) modified transfer 

379 function values and requires 0{£ 2 ) operations. Important optimizations include using 

380 more advanced searching methods (such as bisection) , applying the symmetries E* t = 

381 E*_ m and T m — Tl m , and taking advantage of the very fast decay of J n to neglect 

382 very small terms in the dot product. 
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1 Choose N™ ax sufficiently larger than 2£+l; 

2 Tn <- ^,,(^,0), n = 0,. . ., JVT* - 1; 

3 T„^|^[T]|; 

4 £„ «- |J n («;|r|)|; 

5 for N 8 from 21 to N™ ax by 2 do 

r if • T < e/4ir 2 then 
8 |_ return A/g 

Algorithm 2: Pseudocode to compute Ag given £, |r |, and |r|. 



3.3.2. Choosing N^. After determining an appropriate Ng, letting be a 
function of allows reducing the number of quadrature points without affecting the 
error. The worst-case for the integration error in terms of occurs when r and r 
are in the xy-plane. Without loss of generality, suppose r = x. Then, for a fixed n , 
the plane- wave (3.11) can be expressed as 

oo 

£V(0n,0) = e Mt ' (fl ~*>- r = E i m J m (K\r\sm(0 n ))e lm * (3.15) 

Ttl — — OO 

Since J m (^K |t| sin($ n )) is exponentially small when m > k \r\ sin(0„), the series can be 
truncated at N^(9 n ) <~ k \r \ sin(0„) without incurring any appreciable error provided 
that the exact Fourier coefficients of T*'^ g (9,<f>) in <j> are available. This is the case 

since T%'^ is bandlimited in <j>. Additionally, letting AT^ be a function of requires a 
final step in the computation of the modified transfer function. Section 3.2 computed 
the transfer function on a Ng/2 + 1 x grid. With N$ — > N<p(9 n ), the data computed 
for each n must be Fourier anterpolated from length to length N^,(9 n ). 

Estimates of N ( p(9 n ) can be developed by determining when J m (n \r\ sin(0 n )) 
becomes exponentially small, as in the computation of the EBF in [3]. However, we 
find that the EBF generated quadrature typically overestimates the sampling rate. 
To accurately compute N^{9 n ) a similar procedure to that in Section 3.3.1 is applied. 
After determining the appropriate Ng, the T|f^ can be computed. For a given 9, the 



400 error is 

r ** 2tt N * (0) 

/ E ±H& (9, 4>) T*;± lrol& (9, <t>) dc/) - — — E ±\r\x(0, <t>rn) T t 
Jo ^<t>\°) _ =1 



,'±|r |*(^' 



401 where T s LL is the bandlimited modified transfer function with both the ^-frequencies 

402 and (^-frequencies truncated for the quadrature. That is, if 

1 oo i 

I/,r (M) = ^,r (M) |sin(0)| = E E K m e n6+m4> 



n— — oo m- 



then 



Ne/2-l I 

T tr ^A) = E E Kme n9+m4 

n=-N g /2+1 m=-l 
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and 



N e /2-l Nt(0 n )/2-l 

^(M) = E E 

n=-N g /2+l m=-A^(0„)/2+l 



T~ e 



n9+m<p 



405 We again apply the result of Section 3.1 by retrieving the Fourier coefficients of the 

406 plane wave in the case f = x from Equation (3.15) and computing exactly the low 
40? frequencies of the modified transfer function. Equation (3.9) leads to 



<2^ E \ T ™ L (°) \MNm,m)^\r\sm(9))\ 



m=-t 



(3.16) 



4oa Equation (3.16) can then be used to search for a value of N ( f ) (9 n ) via Algorithm 3. 



1 Choose N™ ax sufficiently larger than 2£ - 

2 for 9 n ,n = 0,.. .,N e /2, do 



0,...,2t 



T„ 

T m ^\Ft[T]\- 

\r\ sin(0 n ))|; 
for N49 n ) from 4 to NJ ax by 4 do 

if E* ■ T < e/An 2 then 
L Save N^(9 n ) 

Algorithm 3: Pseudocode to compute each N^(9 n ) given £, \r \, \r\, and N$. 



409 Since N™ ax is only a small constant larger than 2£ + l, the algorithm as presented 

410 is dominated by the computation of the modified transfer function and requires 0(£ 3 ) 
4n operations. Optimizations similar to those presented in Section 3.3.1 can be applied. 

412 Using the EBF as an initial guess in the search for N^(9 n ) further improves the 

413 searching speed. Additionally, only half of the N ( f ) (9 n ys may be computed due to 
4« symmetry about the z — plane. 

415 3.3.3. Choosing \r\ and |t*o|. The previous algorithms require representative 

416 values of \r\ and |t"o| for each level of the tree. The worst-case transfer vectors, r*o, 

417 are those with smallest length. If ai is the box size at level I, then |ro| = 2a; is the 

418 smallest transfer vector length in the common one buffer-box case. 

419 The worst-case value of |r| is the largest. For a box of size ai, \r\ < aiV3. 

420 However, using |r| = ai^J~Z in the previous methods is often too conservative. This 

421 case only occurs when two points are located in the exact corners of the boxes - a 

422 very rare case. See Figure 3.5. Instead, we let \r\ — aai V3 for some a € [0, 1]. A 

423 high a guarantees an upper bound on the error generated by the quadrature, but the 

424 points which actually generate this error become increasingly rare. A lower value of 

425 a will yield a smaller quadrature, but more points may fall outside the radius \r\ for 

426 which the upper bound on the error is guaranteed. 

427 3.3.4. Number of Quadrature Points. Recall from Section 2.2 that the typ- 

428 ical approach in the FMM is to use N + 1 uniform points in the <f> direction and ^±± 
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Fig. 3.5. The worst-case r and ro, projected from the 3D box. Here, \ro\ 
are on the opposite corners of the box so that \r\ = |r; c | + |r CJ | = a;i/3- 



2ai and i and j 



429 Gauss-Legendre points in the 9 direction so that all Y™, —n < m < n, < n < N 

430 are integrated exactly. In [16], Chew et al. takes = I + 1, which is an approx- 

431 imate choice based on the rapid decay of the coefficients in the spherical harmonics 

432 expansion of a plane- wave. This results in approximately 



433 
434 
435 



436 
437 



438 
439 
440 
441 

442 
443 
444 
445 
446 



K sh = 2(£+1) 2 



2t 



quadrature points. 

For a given Gegenbauer series truncation £, the total number of quadrature points 
required in the Fourier based FMM is approximately 

No i r 

2 n Jo 

«(* + Ci)-/ (2£ + C 2 (6))sm(e)d0 
n Jo 

where C\,Ci > 1 are small integers dependent on £, numerically computed from the 
methods in Section 3.3.1, 3.3.2. Keeping only the leading term in I: 



K 



fb 



4 

7T 



1.3 1 



Thus, the method presented in this paper uses approximately 0.64 times the number 
of quadrature points in the standard FMM. However, it is possible that the same 
optimization can be applied to the standard FMM for the same reasons it was applied 
in Section 3.3.2 to reduce the standard quadrature to a comparable size. 

3.4. Interpolation and Anterpolation. Most importantly, the Fourier based 
FMM directly uses FFTs in the interpolation and anterpolation steps. This makes the 
time critical upward pass and downward pass efficient and easy to implement while 
retaining the exactness of global methods. 

Characterize a quadrature by an array of length Ng/2 + 1, 

K = [N+(6 ), N^), . . . , A^(0jv e /2-i), N+(0 Nt/2 )] 

44? noting that N<p(9 n ) = N<p(9 Ng /2+ n ). The data -F(#„,0 m ) sampled on a quadrature 

448 K is transformed to another quadrature K 1 by performing a sequence of Fourier 

449 interpolations and anterpolations. Let 



M 6 



max 



max NA6 n ), max N'A6 n ) 

0<n<Ng/2 0<n<N' e /2 v 



450 Then, the following steps, as illustrated in Figure 3.6, perform an exact interpola- 

451 tion/anterpolation using only FFTs. 
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452 1. For each n , < n < N /2, Fourier interpolate the data [F(6 n , ^m)}^^"^ 1 

453 from length N ( p(9 n ) to N<f,. 

454 2. For each <p m , < m < Af<p/2, apply symmetry (3.6) to construct the 9- 

455 periodic sequences [F(9 n , <p m )]n=o 

456 3. For each </> m , < m < Fourier interpolate the data [F(9 n , (j> m )\n=o 1 

457 from length Nq to N' e . 

45a 4. For each 9 n , < n < N' g /2, apply symmetry (3.6) to construct the 0-periodic 

459 sequences [F{9 n ,(t> m )] m 4 L 1 . 

460 5. For each 9 n , < n < Ng/2, Fourier anterpolate the data [F(9 n , cf) m )] J ^ l ' l L 1 

461 from length A/0 to N'A9 n ). 



462 A slightly more efficient algorithm uses the symmetry (3.5) rather than (3.6). 




Fig. 3.6. The data profile at each step in an anterpolation from a large quadrature K with 
Ng = 30 to a smaller quadrature K' with N' g = 24. The data corresponding to a pole has been 
darkened for clarity. 

463 3.5. Numerical Results. In the following sections, we will use the total mea- 

464 sured error, St, defined as 

«|r+r | 4lr 2 N ^ 

* = yT^-ZNWM ? Zr(0 M Tlt(0 M (3.17) 

1 1 n=l ^ y ' m=l 

465 The total Gegenbauer truncation error, eg, is 

iK|r+r | 

£G = \r + r a \ ~ JK E(- 1 )"( 2n + l ) h nX* ko|)j'n(« \r\)Pn(r ■ f ) (3.18) 

n— 

466 The total integration error e/ is 

e I = e T -e G (3.19) 

46? 3.5.1. Single-Level Error. In the first test case, shown in Figure 3.7, we exam- 

468 ine the choice of Ng by using the worst case ro = |ro| z. Here, the optimal Gegenbauer 

469 truncation I is obtained as explained in Section 2.1 [see Equation (2.6)]. The quadra- 

470 ture for computing the integral (3.1) was constructed as described in Section 3.3. 

471 With box size a = 1, the quadrature and Gegenbauer truncation are constructed with 

472 |r| = 0.8a^/3, |r | = 2a, and target error e = 10~ 4 , 1CT 8 . The plotted errors represent 

473 the maximum found over many directions f, verifying that the worst case is r ~ ±z. 

474 We note that the target error e is accurately achieved, for all frequencies (except 

475 in the low- frequency breakdown because of round off errors). The actual error is 

476 within a factor of 2 or less of the target error. The increase in error for small box 
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sizes corresponds to the low frequency breakdown when the transfer function has very 
large amplitude and roundoff errors become dominant. In this regime the quadrature 
target error bound can also be relaxed to improve efficiency — it is inefficient to have 
a large quadrature that provides a small integration error when the transfer function 
cannot provide comparable accuracy. 

We also show a comparison with a simple heuristic for choosing Ng. Based on 
the truncation of the Gegenbauer series, we expect Ng = 21 + 1 (rounded up to the 
nearest multiple of 2) to be a reasonable guess. The error resulting from this choice 
is shown by the curve labeled ef^. This shows that our scheme produces a more 
accurate estimate of this parameter and, in particular, with our scheme, we no longer 
see the drift in error seen with ef^. 




10 



100 



1,000 



10,000 



|£g| 

M 

\st\ 

ebf | 



e 



Fig. 3.7. Results with ro ~ z, target accuracy e = 10 4 and e = 10 8 , the Gegenbauer 
truncation chosen as in Section 2.1, Ng chosen as in Section 3.3.1, and box size a = 1. The ef 1 ^ 
curve corresponds to using the EBF along with Ng = 21 + 1; this choice leads to a drift in the 
error as the frequency increases. In contrast, using our error estimate and scheme to choose the 
parameters, the error follows closely the target error. 



4sa The second test case, shown in Figure 3.8, shows the accuracy resulting from 

489 our choice of N$ using the worst case r = |i*o| x. Again, the direct computation was 

490 used to find the optimal Gegenbauer truncation I and the quadrature was constructed 

491 following Section 3.3. With box size a = 1, the quadrature and Gegenbauer truncation 

492 are constructed with \r\ = 0.8aV3, \ro\ — 2a, and target error e — 10~ 4 and 10~ 8 . 

493 The plotted errors represent the maximum found over many directions r, verifying 

494 that the worst case is r ~ ±x. We can see that the target error is achieved even more 

495 accurately than the z case. This is due to the number of iV^ that are chosen — one 

496 for each 8 n . The A^(^ n ) that yields the most inaccurate result dominates the others, 

497 bringing the integration error very close to the target. The comparison heuristic is 

498 ef^ uses the constant N,p = 21 + 1 (rounded up to the nearest multiple of 4) . Again, 

499 the target error should be relaxed in the low frequency breakdown regime. 

500 Figure 3.9 shows the ratio of the number of points in the quadrature used in 

501 Figure 3.8 to the number of quadrature points that would be used in a typical spherical 

502 harmonics based FMM using the same Gegenbauer truncation £ chosen by the direct 

503 calculation. The procedures presented in this paper result in a quadrature which is 

504 substantially smaller than what would typically be used. Notably, the analysis in 



20 



C. CECKA AND E. DARVE 




Fig. 3.8. Results with ro ~ x, target accuracy e = 10 and e = 10 , t/ie Gegenbauer 
truncation chosen as in Section 2.1, Ng and Na, chosen as in Section 3.3.1 and 3.3.2, and box 
size a = 1. The curve corresponds to using the EBF along with N t j ) (9 n ) = 21 + 1; this choice 

greatly overestimates the size of the quadrature needed in the 4>-direction especially near the poles 
9 = {0, 7r}. In contrast, using our error estimate and scheme to choose the parameters, the error 
follows closely the target error. 



505 Section 3.3.4 is supported. 



1 rrmj 




1 10 100 1,000 10,000 

/V 

Fig. 3.9. The ratio of the number of quadrature points required in the Fourier based FMM 
and what would be used in a typical spherical harmonics based FMM for the same I. The curve 
asymptotes close to 2/ir i=s 0.64 as expected from Section 3.3-4 

506 Together, these results demonstrate that by choosing £ and the quadrature as 

so? presented in this paper, the error is controlled and the quadrature size is chosen 

5oa nearly optimally. The accurate error bounds derived in Section 3.3 means that we 

509 can provide a sharp bound of the total final error of the method and optimize the 

510 running time of the method for that prescribed error. A reduction in the quadrature 
5n size improves memory usage and suggests an improved running time over similar 
512 algorithms. 

so 3.5.2. Multi-Level Error. The previous section verified the error bounds de- 

514 rived for a single level. In this section, we show that the method performs as expected 

515 in the multilevel case as well. We considered two points distributed as in Figure 3.5, 

516 at opposite corners of the box. The transfer pass (M2L) is always done at the highest 

517 level in the tree (level 2). Then, the number of levels is increased, thereby adding 
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sis additional translation steps to the calculation. This test therefore checks that the 

519 Fourier based interpolation and anterpolation procedure (Section 3.4) does not affect 

520 the accuracy of the calculation. 

521 Figure 3.10 shows the convergence of the method as the target error is adjusted, 

522 thereby changing I and the quadrature size appropriately. We show £q (Gegenbauer 

523 truncation), (total error), and ef (interpolation error obtained as the difference 

524 between £q and Ej), for L = 2, . . . , 8 (L is the total number of levels). The quadrature 

525 size is the total number of quadrature points at the highest active level, with box size 

526 a,2 — 1. Note that for L — 2 there is only one active level and no translation step. 

527 On the same plot, we show the discrepancy between e\ (L = 3, . . . , 8) and e\ which 

528 corresponds to the error due to the upward and downward passes. We expect this 

529 error to be much smaller than the target error, since the target error accounts for the 

530 M2L operation and therefore the quadrature is over-estimated when considering only 

531 the translation functions. This is confirmed by Figure 3.10. The curve Eq is exactly 

532 the same for all L while has small variations with L due to the sampling of the 

533 translation function in the updaward and downward passes. 

534 We note that the method converges super-exponentially, following the rate of 

535 decay of the Jacobi- Anger series (3.11). 




26 28 30 32 34 36 38 40 42 44 46 48 50 52 54 



Quadrature Size (xlOOO) 

Fig. 3.10. Log-log plot of two-level convergence with k = 100, a = 0.8, and highest active level 
box size 02 = 1. For each curve, we have 7 cases with L = 2, . . . , 8. This is denoted by 2-8 in 
the legend of the figure. The 7 cases are nearly undistinguishable except when the error is of order 
10 -12 and below. 

536 To further show that the numerical integral is converging to the Gegenbauer se- 

537 ries value, Figure 3.11 sets the Gegenbauer error to a constant and uses the target 

538 error to increase the quadrature size only. We observe that as the quadrature size 

539 increases the total error becomes exactly equal to the Gegenbauer series truncation 

540 error. For this case, we used two active levels for the MLFMM, therefore including 

541 the FFT interpolation and anterpolation stages. This validates numerically our the- 

542 oretical analysis and shows that the error caused by our anterpolation strategy and 

543 smoothing of the transfer function Tf (8, (f>) can be effectively controlled and behaves 

544 as expected. 

545 3.5.3. Speed. As discussed in Section 3.4, the Fourier based FMM uses only 

546 FFTs in the upward pass and downward pass to perform the interpolations and an- 

547 terpolations. FFTs make these steps easier to implement and very efficient. In addi- 
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Fig. 3.11. Log-log plot of two-level convergence to the truncated Gegenbauer series value with 
K = 100, a = l/v3, and highest active level box size 02 = 1. The convergence of the numerical 
quadrature for (0,4>) (red curves) should be compared with the convergence of the toy problem in 
Figure 3.2 (blue curve in Figure 3.2). 

548 tion, Section 3.3.4 and Figure 3.9 suggest that the quadrature used in this paper is 

549 significantly smaller than what is typically used for a spherical harmonic basis. 

550 All times reported are from repeated runs on a 2.93GHz Intel Core i7 Quad Core 

551 Processor 870 with 8MB Cache with 8GB of 1333MHz DDR3 RAM. 

552 First, we illustrate the efficient use of FFTs in the interpolation and anterpolation 

553 stages. Here, we compare the FFT interpolation method of Section 3.4 against the 

554 semi-naive exact spherical harmonic interpolation method which consists of a forward 

555 FFT in 0, a dense matrix-matrix product on the 6 angles, and a backward FFT in </>. 

556 This spherical harmonics method is analyzed and accelerated in [14, 7]. Figure 3.12 

557 confirms the expected asymptotic running times of each method. 




50 100 200 500 1,000 2,000 5,000 

Gegenbauer Truncation, t 



Fig. 3.12. Comparison of the FFT interpolation scheme of Section 3-4 [complexity 0(£ 2 logl)] 
with the semi-naive 0(t 3 ) spherical harmonic transform described in [14]- 

558 To show that the optimal asymptotic running time is achieved, Figure 3.13 shows 

559 the recorded running times of the Fourier based FMM and the direct matrix-vector 

560 product. The target error is set to 10 with a = 1 and is achieved in every case. For 
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561 N — 8.2 • 10 6 the points are uniformly distributed in a cube with side length 80A. The 

562 wave number k is scaled with JV 1 / 3 to provides a nearly constant density of points 

563 per wavelength as N is varied. As expected, by choosing the correct number of levels 

564 the running time is asymptotically O(N) as N is increased with a constant number 

565 of points per wavelength. Note that the cross-over point is N « 3, 000. 




(1K,23.9) (10K,51.4) (100K,111) (1M,239) (10M,514) 

(N,MHz) 

Fig. 3.13. Average running times of the Fourier based FMM for constant number of volumetric 
points per wavelength. By choosing the optimal number of levels, we achieve an O(N) complexity. 

566 4. Conclusion. We have proposed using the Fourier basis e tp ^e lqe in the spher- 

567 ical variables <j> and 9 to represent the far-field approximation in the FMM. By ap- 
ses proximating the Helmholtz kernel with Equation (1.1) and using a uniform quadra- 

569 ture we can take advantage of very fast, exact, and well-known FFT interpola- 

570 tion/anterpolation methods. By exploiting symmetries and a scheme to reduce the 

571 number of points in the ^-direction, the total number of uniform quadrature points re- 

572 quired is smaller than the number of Gauss-Legendre quadrature points typically used 

573 with spherical harmonics. This is realized by removing the high frequency compo- 

574 ncnts of the modified transfer function, Tf (0,<p), during the precomputation phase 

575 which do not significantly contribute to the final integration. 

576 The Fourier based FMM approach has a number of advantages. Since the inter- 

577 polation and anterpolation algorithms are exact, the error analysis is simplified; we 

578 establish a sharp upper bound for the error. The key parameters are the Gegenbauer 

579 truncation parameter I and the quadrature size, in particular the sampling rate in the 
sac 0-direction. The truncation error eg has been extensively studied by other authors 

581 and is well understood. The integration error £j accounts for the bandlimited approx- 

582 imation of the modified transfer function and the finite sampling of the plane-waves. 

583 This error can be accounted for a priori during the precomputation stage. Numerical 

584 tests have confirmed that this error analysis is quite sharp. Constructive algorithms to 

585 find nearly optimal parameters were proposed. Since highly efficient FFT algorithms 
sse are available in virtually every computing environment, the time-critical interpolation 
587 stages of the algorithm are much easier to implement efficiently. 
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